
HDDS-14913. Implement Scalable CSV Export for Unhealthy Containers in Recon UI.#10162

Draft
ArafatKhan2198 wants to merge 2 commits into apache:master from ArafatKhan2198:csvExport2

Conversation

ArafatKhan2198 (Contributor) commented Apr 30, 2026

What changes were proposed in this pull request?

The Recon UI had no way for administrators to export unhealthy container data (Missing, Under-Replicated, Over-Replicated, etc.) at scale. For clusters with millions of containers, any streaming export over a long-running HTTP connection would be killed by network infrastructure (firewalls, load balancers, proxies) before completion.


Solution: Asynchronous Background Export with Queue

Instead of streaming data directly to the browser, this PR implements a server-side background job system that:

  1. Builds the export on the Recon node itself
  2. Splits large exports into 500K-record CSV chunks
  3. Archives them into a single TAR file
  4. Lets the user download the TAR from the browser when ready
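
The chunk-splitting step above can be sketched with plain JDK I/O. This is a minimal illustration, not the PR's actual code: `writeChunked` is a hypothetical helper, the CSV header is made up, and the 500K threshold is passed in as a parameter so the example can use a small value.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CsvChunkWriter {

  /**
   * Writes records to part001.csv, part002.csv, ... under dir,
   * rolling to a new part file every maxPerFile records.
   * Returns the part files created.
   */
  static List<Path> writeChunked(Iterator<String> records, Path dir,
                                 long maxPerFile) throws IOException {
    List<Path> parts = new ArrayList<>();
    PrintWriter out = null;
    long inCurrentPart = 0;
    while (records.hasNext()) {
      if (out == null || inCurrentPart >= maxPerFile) {
        if (out != null) {
          out.close();
        }
        Path part = dir.resolve(String.format("part%03d.csv", parts.size() + 1));
        out = new PrintWriter(Files.newBufferedWriter(part));
        out.println("containerId,state,replicaDelta"); // hypothetical header
        parts.add(part);
        inCurrentPart = 0;
      }
      out.println(records.next());
      inCurrentPart++;
    }
    if (out != null) {
      out.close();
    }
    return parts;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("export");
    // 7 records with a chunk size of 3 -> 3 part files
    Iterator<String> records = java.util.stream.LongStream.range(0, 7)
        .mapToObj(i -> i + ",MISSING,0").iterator();
    System.out.println(writeChunked(records, dir, 3).size()); // 3
  }
}
```

In the real job the part files would then be fed to the TAR-archiving step before the temp directory is cleaned up.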

Backend Changes

New: ExportJob model (ExportJob.java)

A data class representing one export job with fields:

  • jobId (UUID), userId, state (container state), status (QUEUED → RUNNING → COMPLETED/FAILED)
  • queuePosition, totalRecords, estimatedTotal, progressPercent
  • filePath (path to TAR on disk), submittedAt, startedAt, completedAt, errorMessage
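
Put together, the model might look like the sketch below. The field names come from the list above; the constructor, enum shape, and `updateProgress` helper are illustrative, not the PR's exact code.

```java
import java.time.Instant;
import java.util.UUID;

/** Sketch of the ExportJob data class; shapes are illustrative. */
public class ExportJob {
  enum Status { QUEUED, RUNNING, COMPLETED, FAILED }

  final String jobId = UUID.randomUUID().toString();
  final String userId;
  final String state;            // container state being exported, e.g. "MISSING"
  volatile Status status = Status.QUEUED;

  volatile int queuePosition;
  volatile long totalRecords;    // records written so far
  volatile long estimatedTotal;  // from the up-front COUNT(*)
  volatile int progressPercent;

  volatile String filePath;      // TAR location once COMPLETED
  final Instant submittedAt = Instant.now();
  volatile Instant startedAt;
  volatile Instant completedAt;
  volatile String errorMessage;

  ExportJob(String userId, String state) {
    this.userId = userId;
    this.state = state;
  }

  /** Derives progressPercent from records written vs. the estimate. */
  void updateProgress(long written) {
    totalRecords = written;
    if (estimatedTotal > 0) {
      progressPercent = (int) Math.min(100, written * 100 / estimatedTotal);
    }
  }

  public static void main(String[] args) {
    ExportJob job = new ExportJob("webui", "MISSING");
    job.estimatedTotal = 1000;
    job.updateProgress(250);
    System.out.println(job.status + " " + job.progressPercent + "%"); // QUEUED 25%
  }
}
```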

New: ExportJobManager.java — the core engine

A Guice Singleton that runs for the lifetime of the Recon server:

  • Single-threaded executor — one export runs at a time, eliminating concurrent Derby database access
  • Global queue (max 4 jobs) — incoming requests beyond the limit return HTTP 429
  • 3-second cooldown between jobs (on the worker thread, transparent to users)
  • CSV splitting — every 500K records creates a new part file (e.g., part001.csv, part002.csv)
  • TAR archiving — all part files are archived using Archiver.create() into export_{state}_{userId}_{shortJobId}.tar
  • Progress tracking — runs a COUNT(*) before the cursor opens to calculate estimatedTotal; totalRecords increments live
  • Cleanup — temp CSV files and their directory are deleted after TAR is created
  • Synchronized submitJob() — prevents race conditions when multiple users submit simultaneously
  • getQueuePosition() — walks LinkedHashMap (insertion-order) to return 1-indexed position
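
The queueing pieces above (synchronized submission, a capped insertion-ordered map, a single worker) can be sketched as follows. The method names mirror the description, but the bodies are illustrative, assuming a queue cap of 4 and a no-op export step.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch of the queueing logic; not the PR's actual code. */
public class ExportJobManager {
  static final int MAX_QUEUED_JOBS = 4; // assumption for this sketch

  // Insertion-ordered so the queue position can be derived by walking entries.
  private final Map<String, String> queuedJobs = new LinkedHashMap<>();
  // Single worker thread: one export at a time (daemon so the JVM can exit).
  private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "export-worker");
    t.setDaemon(true);
    return t;
  });

  /** Returns a jobId, or null when the queue is full (caller maps null to HTTP 429). */
  synchronized String submitJob(String containerState) {
    if (queuedJobs.size() >= MAX_QUEUED_JOBS) {
      return null;
    }
    String jobId = UUID.randomUUID().toString();
    queuedJobs.put(jobId, containerState);
    worker.submit(() -> runExport(jobId));
    return jobId;
  }

  /** 1-indexed position in the insertion-ordered map; 0 if unknown. */
  synchronized int getQueuePosition(String jobId) {
    int pos = 1;
    for (String id : queuedJobs.keySet()) {
      if (id.equals(jobId)) {
        return pos;
      }
      pos++;
    }
    return 0;
  }

  private void runExport(String jobId) {
    // Real job: COUNT(*), stream cursor to CSV parts, tar, clean up,
    // then remove the entry from queuedJobs and apply the cooldown.
  }

  public static void main(String[] args) {
    ExportJobManager mgr = new ExportJobManager();
    String id = mgr.submitJob("MISSING");
    System.out.println("position: " + mgr.getQueuePosition(id)); // position: 1
  }
}
```

Because both `submitJob` and `getQueuePosition` synchronize on the manager, a burst of simultaneous submissions cannot overfill the queue or observe a half-updated position.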

ContainerEndpoint.java — new REST endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/v1/containers/unhealthy/export | Submit a new export job |
| GET | /api/v1/containers/unhealthy/export | List all jobs (new) |
| GET | /api/v1/containers/unhealthy/export/{jobId} | Get one job's status |
| GET | /api/v1/containers/unhealthy/export/{jobId}/download | Stream the TAR to the browser |
| DELETE | /api/v1/containers/unhealthy/export/{jobId} | Cancel a job |

Queue-full (429) errors return JSON instead of Jetty's HTML error page.
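
A minimal sketch of such a JSON error body; the field names and message text here are illustrative, not the PR's exact schema.

```java
/** Builds the JSON entity returned alongside HTTP 429 when the export queue is full. */
public class QueueFullError {
  static String queueFullBody(int maxJobs) {
    return "{\"httpCode\":429,"
        + "\"error\":\"TOO_MANY_REQUESTS\","
        + "\"message\":\"Export queue is full (max " + maxJobs
        + " jobs). Please retry once a job completes.\"}";
  }

  public static void main(String[] args) {
    System.out.println(queueFullBody(4));
  }
}
```

Returning a structured body lets the frontend surface the specific message in a toast instead of parsing Jetty's HTML error page.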

ContainerHealthSchemaManager.java

  • Added getUnhealthyContainersCursor() — jOOQ lazy cursor for streaming DB records without holding them all in JVM heap
  • Added getUnhealthyContainersCount() — fast COUNT(*) used before the cursor opens for progress estimation
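
The count-then-stream pattern can be illustrated with a plain `Iterator` standing in for the jOOQ `Cursor`. This is a stdlib analogue of the control flow only; the real code fetches rows lazily from Derby rather than from an in-memory list.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

/** Stdlib analogue of count-then-stream; a jOOQ Cursor would replace the Iterator. */
public class StreamingExport {

  /** Streams rows one at a time, reporting percent complete after each row. */
  static long streamWithProgress(long estimatedTotal, Iterator<String> cursor,
                                 LongConsumer onPercent) {
    long written = 0;
    while (cursor.hasNext()) {
      cursor.next();          // real job: write the row to the current CSV part
      written++;
      if (estimatedTotal > 0) {
        onPercent.accept(Math.min(100, written * 100 / estimatedTotal));
      }
    }
    return written;
  }

  public static void main(String[] args) {
    List<String> rows = List.of("c1", "c2", "c3", "c4");
    long total = streamWithProgress(rows.size(), rows.iterator(),
        pct -> System.out.println(pct + "%"));
    System.out.println(total + " records"); // 4 records
  }
}
```

The up-front COUNT(*) only affects the `estimatedTotal` denominator, so a slightly stale count skews the progress bar but never the exported data.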

ReconServerConfigKeys.java

New config keys:

  • ozone.recon.export.worker.threads (default: 1)
  • ozone.recon.export.directory (default: /tmp/recon/exports)
  • ozone.recon.export.max.jobs.total (default: 10)

Frontend Changes (containers.tsx, container.types.ts)

New: Export Tab (tab key '6')

A dedicated Export tab is added to the Containers page alongside Missing, Under-Replicated, etc. It contains:

Submit Controls:

  • Dropdown to select container state (Missing, Under-Replicated, Over-Replicated, Mis-Replicated, Replica Mismatch)
  • "Export CSV" button — POSTs to backend and immediately shows the job in the table below

Active Exports table (hidden when empty):

  • Columns: Job ID (8-char + full ID tooltip), State, Status (colored Tag), Queue Position (#1, #2...), Progress bar + record count
  • No pagination — always compact

Completed Exports table (always visible, paginated):

  • Columns: Job ID, State, Status, Records, Submitted, Started, Completed, Action
  • Download button (only for COMPLETED jobs) — triggers TAR file download to browser
  • Error message tooltip (for FAILED jobs)
  • Timestamps formatted as MMM D, HH:mm:ss

Polling:

  • 3-second interval using setInterval + useRef — starts when Export tab is opened or a job is submitted
  • Auto-stops when no QUEUED or RUNNING jobs remain

Error handling:

  • 429 queue-full error shows a 6-second toast with the specific message
  • All errors show clean messages (no raw HTML from Jetty)
  • Guard in fetchTabData prevents undefined API calls when Export tab is active

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14913

How was this patch tested?

Recon log output from a manual export run (~3 million records):


2026-04-30 09:46:48,962 [pool-56-thread-1] INFO api.ExportJobManager: Starting export job ac16b513-f3f0-4e2d-a124-f208155697c3
2026-04-30 09:46:54,625 [pool-56-thread-1] INFO api.ExportJobManager: Export job ac16b513-f3f0-4e2d-a124-f208155697c3 will process approximately 3040000 records
2026-04-30 09:46:54,628 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part1
2026-04-30 09:47:28,413 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part2
2026-04-30 09:47:57,420 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part3
2026-04-30 09:47:58,876 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part4
2026-04-30 09:48:00,646 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part5
2026-04-30 09:48:02,488 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part6
2026-04-30 09:48:04,261 [pool-56-thread-1] INFO api.ExportJobManager: Created CSV file: part7
2026-04-30 09:48:04,429 [pool-56-thread-1] INFO api.ExportJobManager: Export job ac16b513-f3f0-4e2d-a124-f208155697c3 wrote 3040000 records across 7 files
2026-04-30 09:48:05,730 [pool-56-thread-1] INFO api.ExportJobManager: Created TAR archive: /tmp/recon/exports/export_missing_webui_ac16b513.tar
2026-04-30 09:48:05,755 [pool-56-thread-1] INFO api.ExportJobManager: Deleted temporary CSV files for job ac16b513-f3f0-4e2d-a124-f208155697c3
2026-04-30 09:48:05,755 [pool-56-thread-1] INFO api.ExportJobManager: Completed export job ac16b513-f3f0-4e2d-a124-f208155697c3 (3040000 records)
Demo video: CSV_Export_Feature.mp4

@devmadhuu devmadhuu self-requested a review April 30, 2026 10:17
devmadhuu (Contributor) commented:

@ArafatKhan2198 as discussed, please design the solution to be server-based for a single Recon user. We don't have user-based logins in Recon, and we should not localize the job-progress logic in the browser: browser windows opened on multiple machines viewing the Recon page should all see the same job and its progress. Only one job should be allowed to run at a time, and the remaining two should go into the queue.
